Publication No. 11-175081



JPO and NCIPI are not responsible for any damages caused by the use of this translation.

1. This document has been translated by computer, so the translation may not reflect the original precisely.

2. **** shows a word which cannot be translated.
3. In the drawings, no words are translated.

[Claim(s)] 

[Claim 1] Voice synthesis equipment characterized by comprising: a user description presumption means for presuming
attribute information of one or more users; a voice synthesis parameter decision means for determining, based on said
attribute information, a voice synthesis parameter for adjusting at least one of the wording and the tone of voice; and
a voice synthesis means for uttering predetermined contents while changing at least one of the wording and the tone of
voice according to said determined voice synthesis parameter.

[Claim 2] Voice synthesis equipment characterized by comprising: a user description presumption means for presuming
attribute information of one or more users; a contents decision means of voice synthesis for determining the contents to
be spoken based on said attribute information; a voice synthesis parameter decision means for determining, based on said
attribute information, a voice synthesis parameter for adjusting at least one of the wording and the tone of voice; and
a voice synthesis means for uttering said determined contents while changing at least one of the wording and the tone of
voice according to said determined voice synthesis parameter.

[Claim 3] Voice synthesis equipment according to claim 1 or 2, characterized in that said attribute information includes
at least one of a user's sex, age, and psychic distance.

[Claim 4] Voice synthesis equipment according to any one of claims 1 to 3, characterized in that said voice synthesis
parameter consists of at least one quantity among the degree of frankness of the wording, the gentleness of the tone of
voice, and the rate of voice synthesis.

[Claim 5] Voice synthesis equipment according to any one of claims 1 to 4, characterized in that said voice synthesis
parameter decision means holds a changeable internal rule, used when determining said voice synthesis parameter from said
attribute information, so that the tendency of at least one of the wording and the tone of voice can be changed
arbitrarily.

[Claim 6] A voice synthesis method characterized by: presuming attribute information of one or more users; determining,
based on said attribute information, a voice synthesis parameter for adjusting at least one of the wording and the tone
of voice; and, when uttering predetermined contents, uttering them while changing at least one of the wording and the
tone of voice according to said determined voice synthesis parameter.

[Claim 7] A voice synthesis method characterized by: presuming attribute information of one or more users; determining
the contents to be spoken based on said attribute information; determining, based on said attribute information, a voice
synthesis parameter for adjusting at least one of the wording and the tone of voice; and, when uttering said determined
contents, uttering them while changing at least one of the wording and the tone of voice according to said determined
voice synthesis parameter.






DETAILED DESCRIPTION 



[Detailed Description of the Invention] 
[0001] 

[Field of the Invention] This invention relates to a user interface that converses with humans mainly by voice, and in
particular to voice synthesis equipment and a voice synthesis method for changing the wording and the tone of voice of a
machine that talks with humans in natural language.
[0002] 

[Description of the Prior Art] The mechanism that handles the interaction between a machine and a human, in which the
human gives orders to the machine and receives responses from it, is generally called a user interface. The user
interface is especially important in the field of computers, where the command line using a keyboard and the graphical
user interface (GUI) using a mouse have long been in practical use.
[0003] As the interface to follow the GUI, voice interfaces, which can be commanded in the language that humans speak,
are being studied, and many have been proposed. A voice interface requires a speech recognition function, by which the
machine recognizes the language a human speaks, and a speech synthesis function, by which the machine speaks to the
human. It also requires a dialogue control function that steers the flow of conversation toward a predetermined purpose
and controls the timing of voice synthesis and the like. In dialogue by natural language, the kinds of expressions and
words and the ways a dialogue can progress vary very widely, so it is difficult to recognize these correctly and to use
them appropriately; at present, such interfaces are put to practical use by limiting the subject matter and thereby
narrowing this range.

[0004] On the other hand, it is known that human conversation also involves visual expressions other than voice, such as
facial expressions and gestures, and that these play an important role in conversation. This is also why people feel
that what is said is conveyed more exactly in a direct meeting than over the telephone. A multi-modal interface handles
two or more different kinds of expression (modalities), such as voice and facial expression, in an integrated manner.
It recognizes and interprets voice, facial expression, gestures, etc. comprehensively, and has the feature that an
accurate interpretation can be made on the whole even if each individual expression is ambiguous.
[0005] Furthermore, an interface that is given a human-like face and voice and that works like a secretary or a helper
as the user's partner is called a personification interface. This interface is not only human-like in appearance; it
presents the figure of an intelligent computer that interprets the user's instructions and assembles and carries out the
required procedure by itself. The technique for handling this kind of machine intelligence (autonomy etc.) is called
agent technology.
[0006] 

[Problem(s) to be Solved by the Invention] The voice interface, the multi-modal interface, and the personification
interface described above are all interfaces that converse mainly by voice. However, they are still under research. The
main technical problems so far have been the basic improvement of recognition and synthesis performance, the study of
dialogue models, and the improvement of user authentication and of the recognition of facial expressions and gestures,
and at present a quality approaching actual human conversation is not easily realized. This is not only because each
element is difficult to handle, but also because human conversation contains a great variety of elements, many of which
have not yet been addressed technically.

[0007] Regarding interfaces that treat users as individuals, two kinds of system have been proposed: a system that
authenticates a specific user and makes an interface agent dedicated to that user appear, and a system that separately
authenticates two or more users who come and go and has a single interface agent deal with them. The former can change
the wording and the tone of voice for each agent, that is, for each individual, but the same agent cannot change its
wording or tone of voice to suit its partner. The same is true of the latter. What these agents can change according to
the user is the contents they speak, for example saying "Good morning, Mr. ○○" to a person they know and "How do you
do" to a person they do not know. That is, if the contents are the same, a conventional agent can only speak with the
same wording and the same tone of voice, whether to a child or to an adult. Consequently, when dealing with two or more
users at the same time, a conventional agent can indicate whom it is about to address by calling a name before speaking
or by turning toward that person, but it cannot use wording and a tone of voice suited to each user, and so it cannot
let the users know by its manner of speaking who the intended partner is.

[0008] Moreover, although it was thought that a user would take what is said more favorably if, in addition to changing
the wording and the tone of voice, the contents spoken could also be changed for every user, such a synergistic effect
could not be expected from conventional interface agents either.

[0009] This invention was made in consideration of the above circumstances. It aims to provide voice synthesis equipment
and a voice synthesis method that can realize a user interface which, even when facing two or more users at the same
time, lets each user know whom the machine is speaking to, gives that user a sense of closeness to the machine or a
pleasant feeling, and thereby improves the feeling of use.
[0010] This invention also aims to provide voice synthesis equipment and a voice synthesis method that make it easy to
give each user interface a different character. Furthermore, this invention aims to provide voice synthesis equipment
and a voice synthesis method that also make it possible to change the contents spoken according to the user.
[0011] 

[Means for Solving the Problem] The voice synthesis equipment concerning this invention (claim 1) is characterized by
comprising: a user description presumption means for presuming attribute information of one or more users; a voice
synthesis parameter decision means for determining, based on said attribute information, a voice synthesis parameter for
adjusting at least one of the wording and the tone of voice; and a voice synthesis means for uttering predetermined
contents while changing at least one of the wording and the tone of voice according to said determined voice synthesis
parameter.

[0012] The voice synthesis equipment concerning this invention (claim 2) is characterized by comprising: a user
description presumption means for presuming attribute information of one or more users; a contents decision means of
voice synthesis for determining the contents to be spoken based on said attribute information; a voice synthesis
parameter decision means for determining, based on said attribute information, a voice synthesis parameter for adjusting
at least one of the wording and the tone of voice; and a voice synthesis means for uttering said determined contents
while changing at least one of the wording and the tone of voice according to said determined voice synthesis parameter.
[0013] Preferably, said attribute information includes at least one of a user's sex, age, and psychic distance.
Preferably, said voice synthesis parameter consists of at least one quantity among the degree of frankness of the
wording, the gentleness of the tone of voice, and the rate of voice synthesis.

[0014] Preferably, said voice synthesis parameter decision means holds a changeable internal rule, used when determining
said voice synthesis parameter from said attribute information, so that the tendency of at least one of the wording and
the tone of voice can be changed arbitrarily.

[0015] The voice synthesis method concerning this invention (claim 6) is characterized by: presuming attribute
information of one or more users; determining, based on said attribute information, a voice synthesis parameter for
adjusting at least one of the wording and the tone of voice; and, when uttering predetermined contents, uttering them
while changing at least one of the wording and the tone of voice according to said determined voice synthesis parameter.
[0016] The voice synthesis method concerning this invention (claim 7) is characterized by: presuming attribute
information of one or more users; determining the contents to be spoken based on said attribute information;
determining, based on said attribute information, a voice synthesis parameter for adjusting at least one of the wording
and the tone of voice; and, when uttering said determined contents, uttering them while changing at least one of the
wording and the tone of voice according to said determined voice synthesis parameter.

[0017] According to this invention, a user's age, sex, and psychic distance are presumed, and during dialogue with the
user in natural language the machine changes its wording or tone of voice according to the presumed result. As a result,
not only when the equipment faces a single user but also when it faces two or more users at the same time, each user
can tell whom the machine is speaking to from the difference in the equipment's wording or tone of voice. In addition,
wording or a tone of voice directed at a specific user can give that user a sense of closeness to the machine, and
wording or a tone of voice matched to that user's psychic distance can give the user a pleasant feeling.

[0018] Moreover, it also becomes possible to presume a user's age, sex, and psychic distance and to change the contents
that the machine utters according to the presumed result. Each user can therefore tell whom the machine is speaking to
from the difference in the equipment's wording or tone of voice, feels a sense of closeness to the machine because
contents intended for that user are spoken with wording and a tone of voice suited only to that user, finds the wording
or tone of voice matched to his or her own psychic distance pleasant, and thus obtains a good feeling of use.

[0019] Furthermore, this invention makes it easy to give each user interface a different character by variously changing
the tendency of how the above wording or tone of voice is changed.

[0020] Therefore, an equipment administrator can systematically characterize each piece of equipment by variously
changing the tendency of how the above wording and tone of voice are changed.

[0021] In addition, each invention concerning equipment described above also holds as an invention concerning a method,
and each invention concerning a method also holds as an invention concerning equipment. Moreover, the above inventions
also hold as a machine-readable medium on which is recorded a program for causing a computer to execute the
corresponding procedure or to function as the corresponding means.
[0022] 

[Embodiment of the Invention] Hereafter, an embodiment of the invention is explained with reference to the drawings.
Below, the embodiment is explained taking as an example the case where the voice synthesis equipment or voice synthesis
function to which this invention is applied is mounted in equipment having a user interface that conducts a
voice-centered dialogue with humans or actively speaks to humans, a multi-modal interface, or the like, for example
information offer equipment, agent equipment, or active advertising equipment (alternatively, the voice synthesis
equipment to which this invention is applied may itself be the information offer equipment, agent equipment, active
advertising equipment, etc.).

[0023] This voice synthesis equipment or voice synthesis function can be realized using software, and various equipment
or services carrying the voice synthesis function concerning this invention can be realized by mounting a program that
implements this voice synthesis equipment or voice synthesis function in a computer or in various equipment.

[0024] First, the basic configuration of the voice synthesis equipment concerning one embodiment of this invention is
explained. Drawing 1 shows the basic configuration of the voice synthesis equipment concerning this embodiment. This
voice synthesis equipment is equipped with the user description presumption section 1, the contents decision section 2
of voice synthesis, the voice synthesis parameter decision section 3, and the voice synthesis section 4.
[0025] The user description presumption section 1 detects each user separately and presumes "person attribute
information", such as sex, age, and psychic distance, for every user. If required, it also determines "person positional
information" showing each user's location and "person identification information" showing each user's name or the like
(including identification information that indicates an unknown person).

[0026] The contents decision section 2 of voice synthesis generates "voice synthesis candidate information" and
"contents information of voice synthesis", which together indicate what is to be said to which user, based on all or
part of each user's person identification information, person attribute information, and person positional information
determined by the user description presumption section 1.

[0027] The voice synthesis parameter decision section 3 computes voice synthesis parameter information that controls the
"wording" and/or the "tone of voice", according to the person attribute information of the voice synthesis candidate
presumed by the user description presumption section 1.

[0028] If required, the voice synthesis section 4 controls, according to the voice synthesis candidate information
determined in the contents decision section 2 of voice synthesis, the face direction and gaze of an agent image (or,
when a doll or the like accompanying the equipment, or an equivalent moving part of the body of the equipment, the
display screen, etc. can be controlled electronically, the face direction and gaze of the doll or the like), or the
facing direction of the body of the equipment, the display screen, etc. At the same time, it outputs as voice the
contents determined in the contents decision section 2 of voice synthesis, according to the voice synthesis parameter
information determined in the voice synthesis parameter decision section 3.

[0029] Next, the basic processing in the voice synthesis equipment concerning this embodiment is explained. Drawing 2
shows an example of the flow of processing in the voice synthesis equipment concerning this embodiment. In the user
description presumption processing of step S1, person attribute information, such as sex, age, and psychic distance, is
presumed for every user and, if required, person positional information showing each user's location and person
identification information showing each user's name or the like (including identification information that indicates an
unknown person) are determined.

[0030] In the contents decision processing of voice synthesis of step S2, the voice synthesis candidate information and
the contents information of voice synthesis, which together indicate what is to be said to which user, are generated
based on all or part of each user's person identification information, person attribute information, and person
positional information determined at step S1.

[0031] In the voice synthesis parameter decision processing of step S3, voice synthesis parameter information that
controls the wording and/or the tone of voice is computed according to the person attribute information of the voice
synthesis candidate presumed at step S1.

[0032] In the voice synthesis processing of step S4, if required, the face direction and gaze of an agent image (or,
when a doll or the like accompanying the equipment, or an equivalent moving part of the body of the equipment, the
display screen, etc. can be controlled electronically, the face direction and gaze of the doll or the like), or the
facing direction of the body of the equipment, the display screen, etc., is controlled according to the voice synthesis
candidate information determined at step S2. At the same time, the contents determined at step S2 are output as voice
according to the voice synthesis parameter information determined at step S3.
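
The flow of steps S1 to S4 can be pictured as a small pipeline. The following Python sketch is only an illustration under stated assumptions, not the implementation described in this document: the class and function names, the 0 to 100 parameter scales, the age and psychic distance thresholds, and the choice to address the first detected user are all hypothetical. Step S1 is represented by ready-made `Person` records, and step S4 returns a descriptive string instead of driving a synthesizer.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Person:
    """Output of step S1 for one detected user (field names are hypothetical)."""
    name: Optional[str]       # None stands for an unknown person
    age: int
    sex: str
    psychic_distance: float   # smaller means more familiar with the equipment


@dataclass
class SynthesisParams:
    """Output of step S3; the 0-100 scales are an assumption of this sketch."""
    frankness: int
    gentleness: int
    rate: int


def decide_contents(users: List[Person]) -> Tuple[Person, str]:
    """Step S2: pick the user to address and the text to speak to that user."""
    target = users[0]  # assumption: simply address the first detected user
    text = f"Good morning, {target.name}." if target.name else "How do you do?"
    return target, text


def decide_parameters(target: Person) -> SynthesisParams:
    """Step S3: map person attributes to parameters using made-up rules."""
    params = SynthesisParams(frankness=50, gentleness=50, rate=50)
    if target.age < 12:                 # hypothetical rule for children
        params.gentleness += 30
        params.rate -= 20
    if target.psychic_distance < 0.2:   # hypothetical rule for familiar users
        params.frankness += 30
    return params


def synthesize(target: Person, text: str, params: SynthesisParams) -> str:
    """Step S4: stand-in for the synthesizer; returns a description string."""
    who = target.name or "unknown person"
    return f"to {who} [{params.frankness}/{params.gentleness}/{params.rate}]: {text}"


def run_once(users: List[Person]) -> str:
    """Run steps S2 to S4 for users already estimated in step S1."""
    target, text = decide_contents(users)
    return synthesize(target, text, decide_parameters(target))
```

With these assumed rules, the same greeting logic yields different parameters for a familiar child and for an unknown adult, which is the behavior the steps above are meant to produce.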

[0033] Below, more detailed configuration examples are explained in the order of the user description presumption
section 1, the contents decision section 2 of voice synthesis, the voice synthesis parameter decision section 3, and the
voice synthesis section 4. First, the user description presumption section 1 is explained in more detail.
[0034] Drawing 3 shows a configuration example of the user description presumption section 1. As shown in drawing 3,
this user description presumption section 1 is equipped with the person detection dictionary 11, the person detecting
element 12, the person collating dictionary 13, the person collating section 14, the psychic distance presumption
section 15, the sex collating dictionary 16, the sex presumption section 17, the age collating dictionary 18, and the
age presumption section 19.

[0035] The person detection dictionary 11 stores the templates for person detection. The person detecting element 12
receives an image of the space in front of the equipment where users stand, detects the parts of this image that are
sufficiently similar to the person detection templates, outputs their locations as person positional information, and
outputs each detected image part as a person field image.

[0036] The person collating dictionary 13 stores, for each already known person, a person collating template used to
identify that individual. The person collating section 14 collates the person field image obtained by the person
detecting element 12 with the person collating templates, chooses the template that is the most similar and also
sufficiently similar, and outputs the person identification information associated with this template. When no
sufficiently similar template is found as a result of the collation, the person collating section 14 generates person
identification information indicating that the person is a stranger.
[0037] Based on the above person identification information, the psychic distance presumption section 15 updates the
equipment use count kept for each known person, and then computes and outputs a psychic distance according to the
updated count. Here the psychic distance is a value indicating how accustomed the person is to the equipment. It
decreases monotonically with the equipment use count and is, for example, inversely proportional to it.
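
The update-then-compute order described above can be sketched as follows. This is only an illustration, assuming the
inverse-proportional form given as an example. The class name, the constant SCALE (the distance reported on first use),
and the person identifier are hypothetical and do not appear in the specification.

```python
from collections import defaultdict

SCALE = 100.0  # assumed value: psychic distance reported on the first use


class PsychicDistanceEstimator:
    """Keeps an equipment use count per known person and derives a distance."""

    def __init__(self):
        self.use_count = defaultdict(int)

    def update(self, person_id: str) -> float:
        # The use count is updated first; the distance follows the updated count.
        self.use_count[person_id] += 1
        # Inversely proportional, hence monotonically decreasing in the count.
        return SCALE / self.use_count[person_id]


estimator = PsychicDistanceEstimator()
distances = [estimator.update("person-A") for _ in range(4)]
```

Any other monotonically decreasing function of the use count would satisfy the description equally well.
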
[0038] The sex collating dictionary 16 stores the templates for sex collating. The sex presumption section 17 collates
the person field image obtained by the person detecting element 12 with the sex collating templates, chooses the most
similar template, and outputs the sex associated with this template.
[0039] The age collating dictionary 18 stores the templates for age collating. The age presumption section 19 collates
the person field image obtained by the person detecting element 12 with the age collating templates, chooses the most
similar template, and outputs the age associated with this template.
[0040] Like the person collating section 14, the sex presumption section 17 may output the sex associated with a
template only when the most similar template is also sufficiently similar, and may output information indicating that
the sex is indeterminate when no sufficiently similar template is found. Similarly, the age presumption section 19 may
output the age associated with a template only when the most similar template is also sufficiently similar, and may
output information indicating that the age is indeterminate when no sufficiently similar template is found.
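
The "most similar and also sufficiently similar" selection can be sketched as below. The specification does not state how
similarity is measured, so cosine similarity over feature vectors, the threshold of 0.9, and the template values are all
assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def collate(features, templates, threshold=0.9, unknown="indeterminate"):
    """Return the label of the most similar template, or `unknown` when
    even the best match is not sufficiently similar."""
    best_label, best_score = unknown, -1.0
    for label, template in templates.items():
        score = cosine_similarity(features, template)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else unknown


# Hypothetical sex collating templates (feature vectors are made up).
sex_templates = {"male": [1.0, 0.0, 0.2], "female": [0.0, 1.0, 0.2]}
```

The same function would serve the person collating section 14 by passing person templates and "stranger" as the
`unknown` label.
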

[0041] The psychic distance output by the psychic distance presumption section 15, the sex output by the sex
presumption section 17, and the age output by the age presumption section 19 constitute the person attribute
information. The templates stored in the person detection dictionary 11, the person collating dictionary 13, the sex
collating dictionary 16, and the age collating dictionary 18 are information expressing the features of each
corresponding category (human being, individual, sex, and age group, respectively), obtained statistically from a
sufficient number of previously acquired images.

[0042] Next, the contents decision section 2 of voice synthesis is described in more detail. A configuration example
of the contents decision section 2 of voice synthesis is shown in drawing 4. As shown in drawing 4, the contents
decision section 2 of voice synthesis consists of the control regulation database 21, the voice synthesis control
section 22, the selection-rule database 23, the subject database 24, and the contents selection section 25 of voice
synthesis.
[0043] The control regulation database (control regulation DB) 21 stores the control regulations for controlling the
state of voice synthesis. The voice synthesis control section 22 generates, according to these control regulations, a
contents selection command that controls the state of voice synthesis. A contents selection command expresses a
comparatively rough instruction about the contents to be spoken, such as "greet Mr. OO" or "describe main subject".

[0044] The selection-rule database (selection rule DB) 23 stores the selection rules for determining the contents to
be spoken. The subject database (subject DB) 24 stores the subjects to be spoken.

[0045] Based on the selection rules stored in the selection rule DB23, the contents selection section 25 of voice
synthesis reads from the subject DB24 the contents corresponding to the contents selection command output by the voice
synthesis control section 22, and also generates the voice synthesis candidate information indicating which user is
addressed.

[0046] Here, the control regulations stored in the control regulation DB21 are explained. Control regulations are
roughly divided into two kinds: voice synthesis formation rules and regulations for voice synthesis. The former govern
the progress of voice synthesis, and the latter govern the decision of the voice synthesis candidate.
[0047] First, the voice synthesis formation rules are explained in detail. They are regulations that make the state of
voice synthesis change from the time the equipment first detects a user until it finishes speaking the predetermined
subjects, and they lay down the speaking sequence of greetings and main subjects.
[0048] An example of voice synthesis formation rules is shown in drawing 5. In the rules illustrated in drawing 5,
when a user is detected in the waiting state, an initiation greeting ("hello" etc.) is uttered, and then the
predetermined main subjects (a promotion, a self-introduction, etc.) are spoken one after another to this user (or group
of users). Speaking of main subjects is repeated until no main subject remains to be spoken. When all main subjects have
been spoken, a termination greeting ("thank you very much" etc.) is uttered and voice synthesis ends. If the user leaves
partway through, speaking of the main subject is stopped at that point, an interruption greeting ("is it busy?") is
uttered, and the state returns to waiting.
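
The state transitions described for drawing 5 can be sketched as a small state machine. This is an illustration only:
the function name, the polling callable `user_present`, and the choice to return to the waiting state after the
termination greeting are assumptions, since the specification only says that voice synthesis ends at that point.

```python
from enum import Enum, auto


class State(Enum):
    WAITING = auto()
    MAIN_SUBJECT = auto()


def run_session(main_subjects, user_present):
    """Return the utterances of one session and the final state.

    `user_present` is a callable polled before each utterance; returning
    False models the user being absent or leaving partway through.
    """
    utterances = []
    queue = list(main_subjects)
    state = State.WAITING
    while True:
        if state is State.WAITING:
            if not user_present():
                break  # nobody detected: stay in the waiting state
            utterances.append("hello")  # initiation greeting
            state = State.MAIN_SUBJECT
        else:
            if not queue:
                utterances.append("thank you very much")  # termination greeting
                state = State.WAITING
                break
            if not user_present():
                utterances.append("is it busy?")  # interruption greeting
                state = State.WAITING
                break
            utterances.append(queue.pop(0))  # next main subject
    return utterances, state
```

For example, `run_session(["promotion", "self-introduction"], lambda: True)` speaks the greeting, both subjects, and
the termination greeting in that order.
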

[0049] Next, the regulations for voice synthesis are explained in detail. They determine how the person addressed (the
voice synthesis candidate) is switched, in units of clauses or sentences, during the voice synthesis directed by the
voice synthesis formation rules. According to the candidate's position and attribute information, they are used to
change the gaze and face direction of the agent image etc., and the wording and tone of voice used in voice synthesis.

[0050] Two regulations for voice synthesis are prepared: the initiation round regulation, applied when voice synthesis
starts, and the middle-stage round regulation, applied during the voice synthesis that follows. For example, under the
initiation round regulation illustrated next, the equipment greets people in order starting from the nearest person, and
finally goes round at random. When a child is present, the round starts preferentially from the child.

[0051] An example of the initiation round regulation is shown in drawing 6. Under the initiation round regulation
illustrated in drawing 6, when the users are a parent and child pair, the equipment first greets the child with "hello",
and then introduces itself to the parent, saying, "I am XX." In this example, the round proceeds in sentence units. The
number of greeting sentences is arbitrary, but once the greeting sentences have been spoken, application of the
initiation round regulation ends and the equipment moves on to the main subject, even if some users have not yet been
addressed.
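
A minimal sketch of the initiation round follows, assuming each user is described by a name, a child flag, and a
distance from the equipment (all hypothetical field names). It orders children first, then nearest first, and hands out
one greeting sentence per person. The random round mentioned at the end of the regulation is deliberately omitted here.

```python
from itertools import cycle


def initiation_round(users, greeting_sentences):
    """Pair each greeting sentence with the user it is addressed to."""
    # Children come first; within each group, the nearest person comes first.
    order = sorted(users, key=lambda u: (not u["is_child"], u["distance"]))
    # zip() stops when the sentences run out, so users not yet addressed
    # are simply left for the main subject, as the regulation describes.
    return [(u["name"], s) for u, s in zip(cycle(order), greeting_sentences)]


users = [
    {"name": "parent", "is_child": False, "distance": 1.0},
    {"name": "child", "is_child": True, "distance": 1.5},
]
plan = initiation_round(users, ["hello", "I am XX."])
```
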

[0052] Once the main subject begins, the middle-stage round regulation is applied. An example of the middle-stage round
regulation is shown in drawing 7. Under the middle-stage round regulation illustrated in drawing 7, the equipment
preferentially addresses people who suit the subject and people who are eager for the subject. Each subject has a
category (sex, age) of person to whom it is best suited, so a person's goodness of fit (the number of matching
attributes) is calculated as the number of that person's attributes that match the subject. In addition, when a person
actively approaches the equipment or has turned their face toward the equipment, the person is judged to be eager, and
an eagerness value (0/1) is calculated accordingly. The weighted sum of the goodness of fit and the eagerness is then
used as the round priority, and the candidates are visited according to this round priority. The time spent on one
person ranges from several seconds to several tens of seconds and is proportional to the round priority.
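
The round priority can be sketched as below. The weights, the seconds-per-priority factor, and the attribute keys
("sex", "age_group", "approaching", "facing") are assumed values chosen for illustration; the specification gives only
the weighted-sum form and the proportionality of time to priority.

```python
def goodness_of_fit(person, subject):
    """Number of the subject's target attributes that the person matches."""
    return sum(
        1
        for key in ("sex", "age_group")
        if key in subject and person.get(key) == subject[key]
    )


def eagerness(person):
    """1 if the person approaches or faces the equipment, otherwise 0."""
    return 1 if person.get("approaching") or person.get("facing") else 0


def round_schedule(people, subject, w_fit=1.0, w_eager=2.0, seconds_per_point=5.0):
    """Return (name, priority, seconds) tuples, highest priority first."""
    scored = []
    for person in people:
        priority = w_fit * goodness_of_fit(person, subject) + w_eager * eagerness(person)
        scored.append((person["name"], priority, seconds_per_point * priority))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


subject = {"sex": "female", "age_group": "adult"}
people = [
    {"name": "b", "sex": "male", "age_group": "adult"},
    {"name": "a", "sex": "female", "age_group": "adult", "facing": True},
]
schedule = round_schedule(people, subject)
```
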

[0053] Next, the selection rules stored in the selection rule DB23 are explained. Selection rules are regulations for
choosing the contents of voice synthesis according to a contents selection command. An example of selection rules is
shown in drawing 8.

[0054] In the selection rules illustrated in drawing 8, contents common to adults and children are chosen for the
initiation greeting, the termination greeting, and the interruption greeting, whereas for the main subject, different
contents are chosen for adults and for children. Distinctions by sex and by psychic distance are also conceivable as
selection rules for the main subject.

[0055] Next, the information stored in the subject DB24 is explained. The contents of the subject DB24 are texts
written in standard wording. The contents of the subject DB24 when the voice synthesis equipment of this embodiment is
applied as active advertising equipment are illustrated below.
[0056] [Example of subjects]

object = known person: classification = initiation greeting: sentence = "hello, Mr. OO"
object = stranger: classification = initiation greeting: sentence = "how do you do. I am XX."
object = all categories: classification = termination greeting: sentence = "thank you very much"
object = all categories: classification = interruption greeting: sentence = "is it busy? Let's meet again."
object = adult: classification = main subject: sentence = "now, recently ~" (guidance on traveling abroad)
object = adult: classification = main subject: sentence = "the most ~ right now" (guidance on a new movie)
object = child: classification = main subject: sentence = "next time ~" (guidance on a new game)
object = all categories: classification = main subject: sentence = "new attraction ~" (guidance on a theme park)
object = adult and female: classification = main subject: sentence = "this fresh color of autumn ~" (guidance on
cosmetics)

Each subject consists of a candidate list that specifies the categories of candidates, identification information that
specifies the classification of the subject (greeting, main subject, etc.), and the actual sentence. The contents
selection section 25 of voice synthesis searches for subjects whose candidate list contains the candidate category
determined by applying the selection rules, and selects, one by one, subjects that have not yet been spoken. The
sentence of the selected subject is the contents information of voice synthesis output by the contents decision
section 2 of voice synthesis.
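
The subject record and the one-by-one selection can be sketched as follows. The record layout mirrors the three parts
named above, but the class, the category labels ("known", "stranger", "adult", ...), and the convention that an empty
candidate list means "all categories" are assumptions for illustration. The sentences are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    targets: frozenset      # candidate list: categories the listener must belong to
    classification: str     # e.g. "initiation greeting", "main subject"
    sentence: str


SUBJECT_DB = [
    Subject(frozenset({"known"}), "initiation greeting", "hello, Mr. OO"),
    Subject(frozenset({"stranger"}), "initiation greeting", "how do you do. I am XX."),
    Subject(frozenset(), "termination greeting", "thank you very much"),
    Subject(frozenset({"adult"}), "main subject", "guidance on traveling abroad"),
    Subject(frozenset({"child"}), "main subject", "guidance on a new game"),
    Subject(frozenset(), "main subject", "guidance on a theme park"),
    Subject(frozenset({"adult", "female"}), "main subject", "guidance on cosmetics"),
]


def select_subject(db, categories, classification, spoken):
    """Return the next unspoken sentence whose candidate list matches."""
    for index, subject in enumerate(db):
        if index in spoken or subject.classification != classification:
            continue
        if subject.targets <= categories:  # every required category holds
            spoken.add(index)              # mark as spoken
            return subject.sentence
    return None
```

Because a stranger never carries the "known" category, greetings addressed to a known person are never selected for a
stranger under this scheme.
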

[0057] However, when the person is a stranger, sentences whose object is a known person, for example

object = known person: classification = initiation greeting: sentence = "hello, Mr. OO"

are made unavailable.

[0058] In addition, supplementary data (catalog photographs etc.) may be linked to appropriate places in the sentences
of the subject DB24, so that when a linked part is uttered, the corresponding supplementary data are presented at the
same time. Information specifying the agent's body motion at that time may also be attached to each sentence so that,
for example, the agent bows during a greeting or points at the supplementary data while they are presented.

[0059] Next, the voice synthesis parameter decision section 3 is described in more detail. A configuration example of
the voice synthesis parameter decision section 3 is shown in drawing 9. As shown in drawing 9, the voice synthesis
parameter decision section 3 consists of the standard property database 31, the canonical parameter setting section 32,
the individual property database 33, and the individual parameter setup section 34.
[0060] The standard property database (standard property DB) 31 stores the standard adjustment sets for determining
standard voice synthesis parameters according to the candidate's category. The canonical parameter setting section 32
determines the standard voice synthesis parameter information based on the person attribute information output by the
user description presumption section 1 and the above-mentioned standard adjustment sets.

[0061] The individual property database (individual property DB) 33 stores the individual properties that express the
character of each piece of equipment. The individual parameter setup section 34 receives the canonical parameter
information output by the canonical parameter setting section 32 and generates the voice synthesis parameter
information by adding to it the deviation given by the individual properties stored in the individual property DB33.
[0062] The standard property DB31 holds one fundamental parameter set, together with a standard adjustment set for
each candidate category that specifies the amounts of adjustment to it. Specifically, a standard adjustment set for
children, a standard adjustment set for women, and so on are stored. The canonical parameter setting section 32 reads
the standard adjustment sets corresponding to the given categories of the candidate, adjusts the fundamental parameter
set based on them, and generates the canonical parameters for those categories.
[0063] Examples of the fundamental parameter set and the standard adjustment sets are shown below.

[Fundamental parameter set]
Degree of frankness of wording = 50 (default value)
Degree of tenderness of tone of voice = 50 (default value)
Speaking speed = 50 (default value)

[Standard adjustment set for children]
Degree of frankness of wording: +50 (adjusted to a very relaxed and informal feel)
Degree of tenderness of tone of voice: +50 (adjusted to a very gentle feel)
Speaking speed: -30 (adjusted to a slow feel)

[Standard adjustment set for adults]
Degree of frankness of wording: ±0 (default value maintained)
Degree of tenderness of tone of voice: ±0 (default value maintained)
Speaking speed: ±0 (default value maintained)

[Standard adjustment set for elderly persons]
Degree of frankness of wording: ±0 (default value maintained)
Degree of tenderness of tone of voice: +20 (adjusted to a slightly gentle feel)
Speaking speed: -30 (adjusted to a slow feel)

[Standard adjustment set for persons of near psychic distance]
Degree of frankness of wording: +50 (adjusted to a very relaxed and informal feel)
Degree of tenderness of tone of voice: ±0 (default value maintained)
Speaking speed: ±0 (default value maintained)

[Standard adjustment set for persons of far psychic distance]
Degree of frankness of wording: -50 (adjusted to a very formal feel)
Degree of tenderness of tone of voice: ±0 (default value maintained)
Speaking speed: ±0 (default value maintained)

These are regulations, common to all equipment, that set the standard wording and tone of voice decided for each
candidate category. The individual parameter setup section 34 in the next stage converts them into parameters that can
generate wording and a tone of voice carrying the individuality of each piece of equipment. The individual properties
that give this individuality are stored in the individual property DB33.
[0064] Examples of individual properties are shown below.

[Individual properties common to all categories]
Degree of frankness of wording: +10 (tendency to talk casually to anyone)
Degree of tenderness of tone of voice: +10 (tendency to talk gently to anyone)
Speaking speed: -20 (tendency to talk slowly to anyone)

[Category-specific individual properties]
if (the candidate is a child) then degree of tenderness of tone of voice: +10 (tendency to be especially gentle to
children)

An individual property common to all categories is an individuality of the equipment that does not depend on the
candidate's category; it means talking with the same tendency to anyone. A category-specific individual property takes
effect for a specific category of candidate, as in equipment that is gentle to children. By providing individual
properties in this way, the individuality of the equipment can be expressed within a common framework.
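
The two-stage computation (fundamental set plus standard adjustments, then individual properties) can be sketched as
follows, using the example values listed in this description. The dictionary keys, the function names, and the clamping
of results to the range 0 to 100 are assumptions; the specification does not say how out-of-range sums are handled.

```python
BASE = {"frankness": 50, "tenderness": 50, "speed": 50}

STANDARD_ADJUSTMENTS = {
    "child": {"frankness": +50, "tenderness": +50, "speed": -30},
    "adult": {},
    "elderly": {"tenderness": +20, "speed": -30},
    "near": {"frankness": +50},
    "far": {"frankness": -50},
}

COMMON_PROPERTY = {"frankness": +10, "tenderness": +10, "speed": -20}
CATEGORY_PROPERTY = {"child": {"tenderness": +10}}


def canonical_parameters(categories):
    """Fundamental parameter set adjusted by each matching standard set."""
    params = dict(BASE)
    for category in categories:
        for key, delta in STANDARD_ADJUSTMENTS.get(category, {}).items():
            params[key] += delta
    return params


def individual_parameters(canonical, categories):
    """Add the equipment's individual properties, then clamp (assumed 0..100)."""
    params = dict(canonical)
    for key, delta in COMMON_PROPERTY.items():
        params[key] += delta
    for category in categories:
        for key, delta in CATEGORY_PROPERTY.get(category, {}).items():
            params[key] += delta
    return {key: max(0, min(100, value)) for key, value in params.items()}
```

Swapping in a different COMMON_PROPERTY or CATEGORY_PROPERTY table changes the character of one piece of equipment
without touching the standard sets shared by all equipment.
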

[0065] Next, the voice synthesis section 4 is described in more detail. A configuration example of the voice synthesis
section 4 is shown in drawing 10. As shown in drawing 10, the voice synthesis section 4 consists of the sentence
transformation-rule database 41, the voice synthesis sentence transducer 42, the acoustic-sense information generation
section 43, and the vision information generation section 44.

[0066] The sentence transformation-rule database (sentence transformation rule DB) 41 stores the regulations for
converting the contents information of voice synthesis output by the contents decision section 2 of voice synthesis (a
read-out sentence written in standard wording) into a sentence of the prescribed wording.
[0067] The voice synthesis sentence transducer 42 converts the above contents information of voice synthesis, output
by the contents decision section 2 of voice synthesis, into a sentence of the prescribed wording by referring to the
sentence transformation rule DB41 according to the voice synthesis parameters.
[0068] The acoustic-sense information generation section 43 converts the voice synthesis sentence information output
by the voice synthesis sentence transducer 42 into actual voice according to the above-mentioned voice synthesis
parameters, and outputs it. The vision information generation section 44 generates an agent image whose face direction
and gaze follow the voice synthesis candidate information and whose motion follows the additional information attached
to the contents information of voice synthesis (or controls an electronically controllable doll etc.). If required, it
also converts the supplementary data linked to the contents information of voice synthesis into a video signal at the
same time.

[0069] While generating and outputting voice, the acoustic-sense information generation section 43 passes
synchronization information corresponding to this voice to the vision information generation section 44. On receiving
the synchronization information, the vision information generation section 44 opens and closes the agent's mouth
according to it.

[0070] At this time, the relation among the voice synthesis candidate information, the voice synthesis parameter
information, the contents information of voice synthesis, and the voice synthesis sentence information converted based
on them is, for example, as follows.

[Example of voice synthesis candidate information]
Name = Suzuki scent, sex = male, age = 8 years old, psychic distance = 60, location = xx

[Example of voice synthesis parameter information]
Degree of frankness of wording = 100, degree of tenderness of tone of voice = 100, speaking speed = 40

[Example of voice synthesis sentence information (before conversion)]
"hello ~ Mr. OO. I introduce ~ in which a new attraction was substantial."

[Example of voice synthesis sentence information (after conversion)]
"hello ~ the scent. Does a new attraction introduce full ~?"

To perform this conversion, the following transformation rules are described in the sentence transformation rule DB41.
As illustrated below, the transformation rules comprise a name substitution regulation, a sentence-end transformation
rule, and a paraphrasing regulation. An example of the name substitution regulation is shown in drawing 11, an example
of the sentence-end transformation rule in drawing 12, and an example of the paraphrasing regulation in drawing 13.

[0071] The name substitution regulation is a regulation for substituting an appropriate name at the relevant place
when the name of the candidate etc. must be inserted into the contents information of voice synthesis. The sentence-end
transformation rule is a regulation for changing the sentence ending appropriately according to the degree of
frankness. The paraphrasing regulation is a regulation for replacing words that are difficult, especially for children,
with plain words.

[0072] When the sentence-end transformation rule and the paraphrasing regulation are applied, the degree of frankness
and the difficulty of each word used in the text (the contents information of voice synthesis) must be inspected. An
allowed value list is used for this inspection. An allowed value list is information specifying the applicable range of
the degree of frankness or the difficulty for each word. A word that deviates from this range must be replaced with
another word that does not. A synonym list is used for this replacement. A synonym list is information that groups the
synonyms of each word. A word of the text that deviates from the allowed values is looked up in the synonym list and
replaced with another word that does not deviate. When two or more replaceable synonyms exist, one is chosen at random.
This avoids the same word appearing repeatedly in a sentence.
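
The inspection and replacement procedure can be sketched as follows. The word entries, their ranges, and the function
names are hypothetical English stand-ins, not the entries of the actual lists. A word missing from the allowed value
list is assumed to be usable at any value.

```python
import random

# Hypothetical allowed value list: word -> range of the degree of frankness
# within which the word may be used.
ALLOWED_FRANKNESS = {
    "hi": (81, 100),
    "hello": (20, 80),
    "good day": (0, 19),
}

# Hypothetical synonym list: each set groups interchangeable words.
SYNONYMS = [{"hi", "hello", "good day"}]


def in_range(word, value, allowed):
    low, high = allowed.get(word, (0, 100))  # unlisted words are always allowed
    return low <= value <= high


def replace_words(words, value, allowed, synonyms, rng=random):
    """Replace every word outside its allowed range with an in-range synonym."""
    result = []
    for word in words:
        if in_range(word, value, allowed):
            result.append(word)
            continue
        candidates = sorted(
            s
            for group in synonyms
            if word in group
            for s in group
            if s != word and in_range(s, value, allowed)
        )
        # With several candidates one is picked at random; with none,
        # the original word is kept unchanged.
        result.append(rng.choice(candidates) if candidates else word)
    return result
```

The same procedure applies to difficulty by passing a list keyed on difficulty ranges instead of frankness ranges.
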

[0073] Examples of an allowed value list and a synonym list are shown below. The allowed value list and the synonym
list are stored in the sentence transformation rule DB41 together with the sentence transformation rules.

[Example of an allowed value list]
"it carries out", "whether it to carry out", "it carries out", "not carrying out", etc.: degree of frankness > 80
"it carries out", "not carrying out", etc.: 80 >= degree of frankness >= 20
"I will carry out", "I doing", "I not doing", "~" (prefix), etc.: 20 > degree of frankness
"fullness", etc.: difficulty > 50
"full", "it being full", etc.: 50 >= difficulty
"how do you do", etc.: difficulty > 50
"hello and me", etc.: 50 >= difficulty

[Example of a synonym list]
"it carries out", "whether it to carry out", "it carries out", "it carrying out", "I carrying out", "I doing", etc.
"it not carrying out", "it not carrying out", "not doing", etc.

The voice generation section 43 converts the voice synthesis sentence information generated as described above into
voice with the tone and speed specified by the voice synthesis parameters. Information on the rise and fall of the
ending is attached to each sentence-end word. For example, the ending of "whether to carry out" rises and the ending of
"carrying out" falls. In addition, when the tone of voice is gentle, the dynamic range of speech power is narrowed, and
processing that lengthens the ending of the sentence-end word is added, so that tenderness is expressed.
[0074] The voice synthesis equipment and the voice synthesis method according to this invention are not limited to the
above embodiment. For example, instead of changing the contents of voice synthesis according to the user's category,
predetermined contents may be uttered with the wording and tone of voice suited to the user.
[0075] Moreover, the various information stored in the person detection dictionary 11, the person collating
dictionary 13, the sex collating dictionary 16, the age collating dictionary 18, the control regulation DB21, the
selection rule DB23, the subject DB24, the standard property DB31, the individual property DB33, and the sentence
transformation rule DB41 may be made replaceable from outside the equipment, and may be made adjustable in various ways
by operation from outside.

[0076] Moreover, the user description presumption section 1 may estimate the person identification information, person
positional information, and person attribute information not only from images but also from other information such as
voice, used alone or in combination.

[0077] Moreover, the output with converted wording may be presented as text rather than as voice. It is also possible,
as shown in drawing 14, to record information (for example, a program) that realizes the voice synthesis equipment and
the voice synthesis method according to this invention on a record medium 51 and apply it to equipment 52 via this
record medium 51, or to apply the recorded information to equipment 53 via a communication line. This invention is not
limited to the embodiment described above, and can be modified and carried out in various ways within its technical
scope.
[0078] 

[Effect of the Invention] According to this invention, when the equipment faces two or more users at the same time,
each user can tell whom the machine is speaking to from the differences in the equipment's wording and tone of voice.
By being addressed with contents meant for oneself, in wording and a tone of voice suited to oneself, a user can feel a
sense of closeness to the machine, feel comfortable with wording or a tone of voice matched to their psychic distance,
and obtain a good feeling of use.

[0079] Moreover, for the equipment administrator, it becomes possible to systematically give each piece of equipment
its own character by varying the tendencies of the above-mentioned wording and tone of voice in various ways.


